Probability Basics

Joint distribution

The joint distribution of two random variables X and Y:

p(x,y)=p(X=x,Y=y)

Marginal distribution

The marginal distribution of X is obtained by summing the joint over all possible states of Y:

p(X=x) = ∑_y p(X=x, Y=y)

This is also called the sum rule, or the rule of total probability.
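As a numeric sketch of the sum rule (the joint table below is hypothetical), marginalizing is just summing the joint along one axis:

```python
import numpy as np

# Hypothetical joint distribution p(X, Y) over 2 states of X and 3 states of Y.
# Rows index X, columns index Y; all entries sum to 1.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

# Sum rule: marginal of X sums over all states of Y (axis=1).
p_x = joint.sum(axis=1)   # p(X=x) = sum_y p(X=x, Y=y)
p_y = joint.sum(axis=0)   # marginal of Y, by symmetry

print(p_x)
print(p_y)
```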

Conditional joint

If X and Y are conditionally independent given Z, their conditional joint factorizes:

p(X,Y|Z) = p(X|Z) p(Y|Z)

Note that this factorization holds only under conditional independence, not for arbitrary X, Y, Z.
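As a numeric sketch (all tables below are hypothetical), a joint built from the factors p(X|Z), p(Y|Z), and p(Z) satisfies this factorization by construction:

```python
import numpy as np

# Hypothetical conditional tables; each column sums to 1.
p_z = np.array([0.6, 0.4])             # p(Z)
p_x_given_z = np.array([[0.9, 0.2],    # p(X|Z): p_x_given_z[x, z]
                        [0.1, 0.8]])
p_y_given_z = np.array([[0.5, 0.3],    # p(Y|Z): p_y_given_z[y, z]
                        [0.5, 0.7]])

# Build the joint p(x, y, z) assuming X and Y are conditionally
# independent given Z: p(x,y,z) = p(x|z) p(y|z) p(z).
joint = p_x_given_z[:, None, :] * p_y_given_z[None, :, :] * p_z

# Recover p(X,Y|Z) and check it factorizes as p(X|Z) p(Y|Z).
p_xy_given_z = joint / p_z
assert np.allclose(p_xy_given_z,
                   p_x_given_z[:, None, :] * p_y_given_z[None, :, :])
```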

Product rule

p(A,B)=p(A|B)p(B)=p(B|A)p(A)
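The product rule can be checked numerically on any joint table (the one below is hypothetical): dividing the joint by a marginal yields the conditional, and multiplying back recovers the joint both ways:

```python
import numpy as np

# Hypothetical joint distribution p(A, B); rows index A, columns index B.
joint = np.array([[0.12, 0.18],
                  [0.28, 0.42]])

p_b = joint.sum(axis=0)              # p(B)
p_a = joint.sum(axis=1)              # p(A)

# Conditionals: divide the joint by the conditioning marginal.
p_a_given_b = joint / p_b            # p(A|B); each column sums to 1
p_b_given_a = joint / p_a[:, None]   # p(B|A); each row sums to 1

# Product rule holds in both directions:
assert np.allclose(p_a_given_b * p_b, joint)              # p(A|B) p(B)
assert np.allclose(p_b_given_a * p_a[:, None], joint)     # p(B|A) p(A)
```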

Chain rule

p(A1,A2,A3,...,An) = p(A1) × p(A2|A1) × p(A3|A1,A2) × ... × p(An|A1,A2,...,An-1)

Note that a Markov chain can be regarded as a simplified version of the chain rule: each event is predicted by ignoring all previous events except the last one(s), so that p(An|A1,...,An-1) ≈ p(An|An-1).
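A minimal sketch of this first-order Markov simplification, with a hypothetical transition matrix T and initial distribution p0:

```python
import numpy as np

# Hypothetical transition matrix: T[i, j] = p(next = j | current = i).
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p0 = np.array([0.5, 0.5])   # hypothetical initial distribution p(A1)

def markov_sequence_prob(states):
    """p(A1,...,An) ~ p(A1) * prod_k p(Ak | Ak-1) under the Markov assumption."""
    prob = p0[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= T[prev, cur]
    return prob

print(markov_sequence_prob([0, 0, 1, 1]))  # 0.5 * 0.7 * 0.3 * 0.6
```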

Bayes' rule with extra conditioning

p(A|B,C) p(B|C) = p(B|A,C) p(A|C)

Derivation:

p(A|B,C)=p(A,B,C)/p(B,C)

with

p(A,B,C) = p(B|A,C) p(A|C) p(C),  p(B,C) = p(B|C) p(C)

Substituting and cancelling p(C) gives p(A|B,C) = p(B|A,C) p(A|C) / p(B|C), i.e. p(A|B,C) p(B|C) = p(B|A,C) p(A|C).
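The identity can also be verified numerically on a random joint over three binary variables (the joint below is generated for illustration, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint p(A, B, C) over three binary variables, normalized to sum to 1.
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p_c = joint.sum(axis=(0, 1))             # p(C)
p_ac = joint.sum(axis=1)                 # p(A, C)
p_bc = joint.sum(axis=0)                 # p(B, C)

p_a_given_c = p_ac / p_c                 # p(A|C)
p_b_given_c = p_bc / p_c                 # p(B|C)
p_a_given_bc = joint / p_bc[None, :, :]  # p(A|B,C)
p_b_given_ac = joint / p_ac[:, None, :]  # p(B|A,C)

# Both sides equal p(A,B|C) = p(A,B,C) / p(C):
lhs = p_a_given_bc * p_b_given_c[None, :, :]
rhs = p_b_given_ac * p_a_given_c[:, None, :]
assert np.allclose(lhs, rhs)
```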